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Background: Malaria transmission is measured using entomological inoculation rate (EIR), number of 
infective mosquito bites/person/unit time. Understanding heterogeneity of malaria transmission has been 
difficult due to a lack of appropriate data. A comprehensive entomological database compiled by the Malaria 
Transmission Intensity and Mortality Burden across Africa (MTIMBA) project (2001-2004) at several sites is 
the most suitable dataset for studying malaria transmission-mortality relations. The data are sparse and large, 
with small-scale spatial-temporal variation. 

Objective: This work demonstrates a rigorous approach for analysing large and highly variable entomological 
data for the study of malaria transmission heterogeneity, measured by EIR, within the Rufiji Demographic 
Surveillance System (DSS), MTIMBA project site in Tanzania. 

Design: Bayesian geostatistical binomial and negative binomial models with zero inflation were fitted for 
sporozoite rates (SRs) and mosquito density, respectively. The spatial process was approximated from a subset 
of locations. The models were adjusted for environmental effects, seasonality and temporal correlations 
and assessed based on their predictive ability. EIR was calculated using model-based predictions of SR and 
density. 

Results: Malaria transmission was mostly influenced by rain and temperature, which significantly reduces 
the probability of observing zero mosquitoes. High transmission was observed at the onset of heavy rains. 
Transmission intensity reduced significantly during Year 2 and 3, contrary to the Year 1, pronouncing high 
seasonality and spatial variability. The southern part of the DSS showed high transmission throughout the 
years. A spatial shift of transmission intensity was observed where an increase in households with very low 
transmission intensity and significant reduction of locations with high transmission were observed over 
time. Over 68 and 85% of the locations selected for validation for SR and density, respectively, were correctly 
predicted within 95% credible interval indicating good performance of the models. 

Conclusion: Methodology introduced here has the potential for efficient assessment of the contribution of 
malaria transmission in mortality and monitoring performance of control and intervention strategies. 
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Malaria is still endemic in more than 100 
countries worldwide, leaving children and preg- 
nant mothers being the most vulnerable groups 
for infections (1). Global estimates report 219 million 
malaria cases (range 154-289 million) with about 660 
thousands deaths (range 610-971), most of these ( ~ 90%) 
occurring in Africa. The impact of the malaria burden on 



the achievement of Millennium Development Goals is 
enormous, and its control is a potential contribution 
towards significant progress (1). 

Malaria is transmitted by female Anopheles mosqui- 
toes. The transmission intensity is therefore highly sensi- 
tive to environmental variations that affect the densities 
of these vectors and their ability to transmit the infection 
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(2-4). Up to 10-fold variations in transmission intensity 
have been observed within very small localities due to 
geographical, biological or socio-economic factors (5-8). 
Understanding the heterogeneity in transmission and 
human exposure to malaria infection is critical for 
optimizing control programs and targeting interventions 
(9-12). 

Malaria disease burden and transmission can be 
assessed using incidence or prevalence in human hosts. 
However, the entomological inoculation rate (EIR) most 
directly quantifies the exposure of the human popula- 
tion to the infectious stages of the parasite (12-16). EIR 
is the product of the human-biting rate, for example, 
mosquito bites/person/night (which can also be estimated 
using mosquito density) and the sporozoite rate (SR), 
which is the proportion of infective mosquitoes (7, 17). 
The measure expresses the average number of infective 
bites a person receives in a specified unit of time. It can 
be also used to predict other measures of transmission, 
which are used to evaluate effectiveness of malaria 
control program (5, 18). Uncertainty due to small sample, 
low values and variability in the SR and cost complicate 
precise estimation of EIR requiring standardized ento- 
mological surveys conducted over large areas (5, 6, 13, 
14). Accurate estimation of EIR requires longitudinal 
surveys within the study area to take into account spatio- 
temporal variations and seasonality trends. However, 
there is a paucity of this type of data due to cost and 
resources needed to collect them (19-21). 

The Malaria Transmission Intensity and Mortality 
Burden across Africa (MTIMBA) project was initiated 
by the INDEPTH Network (22, 23) and conducted over a 
period of 2001-2004 in several countries in Africa in- 
cluding Tanzania, Kenya, Mozambique, Senegal, Ghana 
and Burkina Faso. The main objective of the initiative 
was to assess the relation between the intensity of malaria 
transmission and all-cause as well as malaria-specific mor- 
tality across Africa, taking into account the influence of 
malaria control activities. The MTIMBA entomological 
data have been collected fortnightly over large number 
of locations (households) and to date this is the only 
available entomological database appropriate to study 
space-time heterogeneity of malaria transmission in 
Africa. These data are sparse with seasonal variations 
and spatio-temporal correlations. High dependence of 
climate, environment and ecological factors in the life 
of mosquito and seasonality any of the survey locations 
had zero mosquitoes or proportion of infected ones. 
In standard modelling approaches, EIR is treated as 
a continuous outcome, logarithmically transformed to 
fulfil the assumption of normality (21, 24-27). However, 
when EIR is estimated as a product of the SR and 
mosquito density, which are generated from the binomial 
and a count distribution like Poisson or negative binomial, 
respectively, normality assumptions are void. To our 



knowledge, Kasasa et al. (28) is the only literature report 
analysis of EIR data considering the two sources of data 
separately. In addition, due to the amount of zeros which 
is larger than what can be generated by the standard 
distributions, the data are over/under dispersed and zero 
inflated (21 , 29-32). Statistical analysis which accounts for 
these characteristics is essential to obtain unbiased esti- 
mates for the regression coefficients (33-36). 

Moreover, the MTIMBA-EIR data have been collected 
at fixed locations and they are typically geostatistical 
data. Similar exposures of environmental and climatic 
conditions to locations which are geographically close 
introduce spatial correlation between them. Geostatistical 
models take into account spatial correlation by introdu- 
cing location-specific random effects as latent observa- 
tions from a multivariate spatial Gaussian process (37). 
Spatial correlation between any pair of locations is often 
considered as a function of distance on the covariance 
matrix of the process. These models have a large number 
of parameters. Bayesian formulations (38) allow model fit 
via Markov Chain Monte Carlo (MCMC) simulation 
methods (39). However, the estimation process involves 
covariance matrix computations which are infeasible 
when the number of locations is too large (40, 41). 
A computational flexible way to overcome this problem 
is the approximation of the spatial process from a subset 
of locations using properties of conditional multivariate 
Gaussian distribution of the process (40-42). Most of 
these techniques have been applied in simulated data, 
observed in regular grid and mainly with Gaussian char- 
acteristics. In this study, selection of subset of locations 
is implemented using methods proposed in our previous 
work (40, 43). 

We now demonstrate a rigorous modelling way of 
analysing large spatio-temporal EIR data and study the 
heterogeneity, space and temporal patterns of malaria 
transmission within one MTIMBA site, the Ruffji DSS 
area in Tanzania (44). The Gaussian process approxima- 
tion proposed by Banerjee et al. (40) is applied to 
binomial (SRs) and negative binomial (density) data 
with zero inflation. The models are fitted using Bayesian 
MCMC simulation and assessed on the basis of their 
predictive ability. Model-based predictions of SR and 
density were multiplied to compute EIR. Model formula- 
tion details are given in the methodology section and 
selected results are presented afterwards. The discussion 
and conclusion of the findings consider the implications 
for timing and allocation of resources for malaria 
interventions. 

Methodology 
Study site 

The study utilized data collected from one of the 
MTIMBA sites in Tanzania, the Rufiji DSS (RDSS). 
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The RDSS is located in Rufiji District, Coast Region, 
Tanzania, about 178 km south of Dar es Salaam. 
The RDSS area extends from 7.47° to 8.03° south 
latitude and 38.62°-39.17° east longitude and operates 
in six contiguous wards and 3 1 villages. The surveillance 
area covers an area of 1,813 km 2 and monitors 85,000 
people, which is about 47% of the total population of 
the Rufiji District (INDEPTH Monogram). Rufiji Dis- 
trict has an overall mean altitude of <500 metres. Its 
vegetation is mainly formed of tropical forests and 
grassland. The district has hot weather throughout the 
year and two rainy seasons: short rains (October- 
December) and long rains (February-May). The average 
annual precipitation in the district is between 800 and 
1,000 mm. A prominent feature in the District is the 
Rufiji River with its large flood plain and delta, the 
most extensive in the country (INDEPTH Monogram; 
Rufiji DSS Profile, 2000). The majority of the people in 
the Rufiji District are subsistence farmers. 

The main responsible malaria vectors in the area include 
A. funestus, and members of the A. gambiae complex, 
including A. gambiae (sensu stricto) and A. arabiensis. 
Mosquito populations usually peak during the rain 
seasons especially in areas where rice cultivation is taking 
place and during the dry months, a high population 
was usually observed in areas with permanent water 
bodies (23). 

Mosquito data 

The entomological data were collected for the period 
of 3 years, October 2001 -September 2004 (Source: http:// 
www.indepth-network.org/dss_site_profiles/rufiji.pdf). The 
MTIMBA entomological protocol has been well defined 
in MTIMBA documentation (unpublished). In a snap- 
shot, mosquitoes were captured at least twice every 
month using Centers for Disease Control (CDC) minia- 
ture light traps. The human population in the RDSS was 
classified into geographical clusters (100-1,000 people), 
then for each round a simple random sampling (without 
replacement) was employed within clusters to select 
between 20 and 100 'index' people (households) for the 
set-up of mosquito catches (traps). The traps were fitted 
indoors with incandescent bulbs and laid close to a 
human volunteer (randomly selected from members 
of the household) sleeping under an untreated bednet. 
Light traps operated from sundown to sunrise (i.e. 6 pm- 
6 am) for two consecutive nights in each household 
and bags were emptied every morning. A total of 2,479 
unique locations (households) involved were geo-refer- 
enced. Collected mosquitoes were counted and sorted 
into vector species to allow for separate assessment of 
transmission intensity. 

Environmental and climatic data 

Remote sensing data were extracted from different 
sources with different spatial, Sp R , and temporal, T R , 



resolutions. These include normalized difference vegeta- 
tion index (NDVI) (Sp R : 250 m 2 ; T R : 16 days; Source: 
MODIS), day and night temperature (Sp R : 1 km 2 ; T R : 8 
days; Source: MODIS), rainfall (Sp R : 8 km 2 ; T R : 10 days; 
Source: ADDS) and distance to the nearest water bodies 
(Sp R : 1 km 2 ; Source: Health Mapper). 

Statistical analysis 

Geostatistical zero inflated negative binomial and logistic 
regression models were fitted on the mosquito density and 
SR data, respectively. The models accounted for the effect 
of environmental and climatic predictors, annual trends, 
seasonal patterns, and spatial and temporal correlations. 
The predictive process was used to approximate the spatial 
process using a subset of locations. Model-based predic- 
tion of SR and density were multiplied to obtain esti- 
mates of monthly and annual EIR. Details of the model 
formulation and its implementation are described in the 
subsections below. Programs used for this analysis are 
available via contact with the corresponding author. 

Model formulation for density data 

Let Y it be the number of female mosquitoes and X*, 1 ' be a 
vector of environmental predictors (extracted from satel- 
lite data) observed at location s„ i = 1,. . .,«, and calendar 
month t = 1,. . .,36 for a specific species. Y it is assumed to 
follow a negative binomial distribution, Y it ~NB(r, p it ), 
where p it =r/(r + fi it ). r is an over-dispersion parameter 
and Hi, is the mean mosquito density. Covariates X-, 1 ', 
seasonal trends f{tf x \ spatial Iff 1 ', temporal = 
(e 1 ,e 2 ,- ■ -,e t ) and non-spatial (jrp random effects are 
introduced on the log scale of the mean count via 
the equation log(ji it ) =X T(1) p <1) +/(0 (1> + C4 1) + e < / ) + <P?\ 
where p (1) is the vector of regression coefficients, (j>\ ' is 
a residual error term capturing the remaining variability 
in the data. /(/) is modelled via trigonometric function 
with a mixture of cycle, C 

m =£ cosh + a_> * }, 

C = 2;t= 1,..., 12/36 

where T c is the period of the season for cycle C (i.e. 
Ti =12 and 7_=6) and 8$ and <5_' are regression 
parameters used to describe the amplitude and phase 
within a period (45, 46). Separate models were fitted 
assuming: (i) a constant seasonal pattern across the 
3 years of the study by taking ? = 1,...,12; or (ii) a 
continuous time for the entire study period by taking 
t = l,. ..,36. The seasonal pattern considering dry/wet 
categorization of the data was also assessed. 

A zero inflated model formulation was adopted to 
take into account the excess zeros in the count data. The 
model is defined as a mixture of a degenerate distribu- 
tion with mass at zero and a non-degenerate count 
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distribution. The log-likelihood is therefore a sum of the 
log-likelihood for the non-zero and the zero counts. The 
distribution of the data is now defined as: 

P(Y = 0\p*, ff)=p* + (l- p*)n{0\6) 
P(Y=y\p*,9) = {\-p*)n(y\&),y>0 

where p* is the probability for a count to arise from the 
zero mass and 1-p* is the probability to observe a sample 
from a count distribution (i.e. n(y 1 8) = NB for our case, 
and 9 is the vector of parameters associated with the 
distribution). This probability can be assigned a value 
between 0 and 1, usually approximates the proportion 
of zero counts in the sample or can be a function of 
covariates similar or different from those used in the 
full model (21, 33, 34, 47). Involving possible sources of 
zero inflation (e.g. covariates) reduces bias in parameter 
estimation of p* and other sources of uncertainty. In 
our case p* is modelled with a logit link as a function 
of all climatic predictors X* observed at location s h i.e. 
logit(p*) = X* T a, where a is the corresponding vector of 
regression coefficients. 

Bayesian model formulation requires the specifica- 
tion of prior distributions for all unknown parameters. 
For the regression coefficients, p (1) , d m and a, a standard 
non-informative uniform prior is adopted, i.e. P (1) ~ 
Unif(-oo,oo), <5 (1) ~Unif(-oo,oo) and a ~Unif(-oo,oo), 
respectively. The latent observations U- 1 ' introduced at 
each location s, are assumed to be derived from a multi- 
variate normal distribution with a covariance matrix Z^ K , 
i.e. U (1) ~ MVN(0, £^J. The is a matrix with elements 
S,j and quantify the covariance Cov(Ui,U/) between the 
pair of locations Si and Sj, respectively. Its distribution 
defines the Gaussian spatial process. Under the assump- 
tion of stationarity, the spatial correlation is taken to be 
a function of distance between locations. An exponential 
correlation structure for the covariance matrix of the 
spatial process is adopted, that is = a 2 ^ exp(— d i} 
where ci' is the spatial variance, dy is the distance between 
locations St and Sj and p w measuring the correlation 
decay and also known as the effective range (3//j (1> ) 
and estimates the distance where the spatial correla- 
tion is < 5%. The decay parameter p fl) assumed to follow 
a gamma distribution. 

Computation of the Gaussian process requires the 
inversion of the covariance matrix, which for a very 
large number of locations is not feasible. To enable model 
fit we approximate the spatial process by a subset of 
locations, knots, {s*,i = 1,. . .,m} Qn< <«) with latent 
observations lf u) ={U(s x *),. . .,t/(s m *)) r . lf {1) is consid- 
ered to arise from the same Gaussian process as and 
thus lf w ~N{Q£*), where E* is the mxm covariance 
matrix of the sub-process. These latent observations U of 
the original process can be approximated by the 'predictions' 
of the sub-process via the mean of Gaussian conditional dis- 



tribution U {1 \s)\ir tn ~N(Q T I,*- 1 U*" , ,<j 2 - Q T 'E t - 1 Q), 
that is U = Q_ T ir- l V { '\ where Q = Cov(U*"\ U {1) ) is an 
mxn matrix of the covariance functions between the 
full and the sub-process (48, 49). Selection of subset of 
location was done using the minimax space filling design 
implemented in R software (50). The approach optimizes 
the selection of the best subset by minimizing the maxi- 
mum of the nearest-neighbour distance between the 
original survey and the subset locations. 

The 4 model temporal correlation via a statio- 
nary autoregressive process of order one, i.e. e x ~ 
Normal (0 , a 2 ^ 1 /(l — y 2 )) and e t \e i r _i~ Normal 
e t-u ff r' 1 )i 1 — 2, where f s a n autocorrelation para- 
meter I"/ 1 ' I < 1 which adopts a bounded uniform distri- 
bution, y (1 ' ~ Unif[— 1, 1] and a 2 j ] is the temporal error 
(51). The is assumed to follow a normal distribution 
with mean zero and a homoscedastic variance of" 1 . Inverse 
gamma priors are adopted for the variance parameters of'", 

2(1) j 2<'> 

a T and a e . 
Model for SR 

Let Nu and Z u be the number of mosquitoes tested 
and number infected, respectively at location s, and 
calendar month t. Z tt is assumed to arise from a binomial 
distribution, Z, 7 ~ Bin(N it ,n it ) , where n it measures the 
SR at location and time t. The regression function 
links the SR with other terms of the model (as shown 
for the density data) and is given as logit 
(n k )=X T ^^+f(tf ) + U i i 2) + E ? ) + 4> ( i 2) . A similar 
specification described for the density model is followed 
in this model. 

Data management and environmental lags 

To facilitate the assessment of the seasonal pattern, data 
were summarized by location and calendar month. That 
implies that all repeated surveys from a specific location 
within the same month were collapsed (sum of mosquito 
density/tested and positive) to a single observation. 

To account for the environmental-lag-effect on mos- 
quito density or SR, non-spatial (negative) binomial 
models (with/without zero inflation) were fitted and 
best lags were assessed. Lags refer to a climate/ environ- 
ment value at different time intervals prior to the study 
date that might influence the amount of mosquitoes 
collected or the SR. Lags considered include the current 
month (month of collection of mosquitoes); 1/2/3 
month(s) prior to the collection; average of current and 
one previous month; average of one and two previous 
months; and lastly average of current, one and two 
previous months. The analysis took into account season- 
ality, distance from water bodies and time (annual effect) 
which was incorporated as a binary variable indicating 
the year of study. Analysis was conducted separately for 
each species. Fitted values from models with all possible 
combinations of the environmental lags were calculated 
and plotted against the observed values (mosquito counts 
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or SR). The combination which best fits the data was 
used for further analysis. This was implemented in 
STATA 10 (Stata Corps). 

Model validation and prediction 

Models were fitted using a training set (85% of the data) 
randomly selected from the entire data. Validation of the 
model performance was done on the test locations (the 
remaining 15% of the data) The predictive ability of 
the model was assessed by specifically calculate different 
credible intervals with different probability coverage of 
the posterior predictive distribution and compare the 
percentage of test locations correctly predicted within 
these credible intervals (52). The best predictive ability of 
the model is observed when higher the number of test 
locations falls within the narrowest credible interval. The 
predicted power of the model at 95% credible interval is 
reported. 

Using the estimates obtained from the models, SR and 
mosquito density were predicted for the whole Rufiji site. 
The prediction was done at the 250 m resolution. 

Calculation of EIR 

The EIR can be estimated as a product of the SR and 
human-biting rate. Depending on the mosquito collec- 
tion method used (human landing, light trap, etc.), the 
human-biting rate can be correctly approximated either 
by the number of blood meals taken on humans/mosquito/ 
day or by the mosquito density. Established correla- 
tion between number of mosquitoes captured by light 
traps and human landing catches is usually used to 
adjust light trap collection to equivalence of biting catches 
and avoid collection bias (53). For this study, EIR was 
calculated as a product of SR and mosquito density 
and then adjusted using a correction factor of 1.605 to 
calibrate estimates obtained from light trap collection 
(28, 53, 54). 



At a specific pixel j and month / the predicted values 
of SR, tu and mosquito density, /L were obtained for A. 
funestus and A. gambiae species. EIR estimates represent- 
ing the infectious bite/person/day were calculated as: 

EIR Jt = 1.605 * ((£, v * %J + * 
where 1.605 is the correction factor. 

A 

The EIR jt was then multiplied by 30.5 and 365 to 
obtain monthly and annual estimates, respectively. 
Monthly and annual maps were produced to show 
seasonal and temporal trends of the transmission. 

Geostatistical model implementation 

The final model was implemented in OpenBUGS and 
parameters were estimated using the Gibbs sampler 
MCMC algorithm. The spatial variance parameter was 
sampled directly from its inverse gamma full conditional 
distributions using Gibbs sampling (39). The remaining 
parameters were simulated using Metropolis algorithm 
with a normal proposal distribution. The mean of 
the proposal distribution was the parameter estimated 
from the previous iteration with a fixed variance (55, 56). 
Two separate chains were run in parallel with a total 
of 150,000 iterations each. A burn-in of 20,000 iterations 
was done and the last 5,000 and 1,000 samples were used 
for posterior inference and prediction, respectively. The 
Gelman-Rubin model diagnostic tool (57) was used 
to assess convergence of chains before summarizing the 
results. The package 'fields' in R was used for selection of 
knots. For practical implementation of the geostatistical 
model 281 knots (2,479 unique locations) were selected 
for the density data (both species), 177 (415 unique 
locations) for SR analysis of A. funestus and 219 (639 
unique locations) for SR of A. gambiae. Predictions and 
calculations of EIR were done in Fortran 95 (Compaq 
Visual Fortran Professional 6.6.0). 




Fig. I. Seasonal variations of (A) rainfall, temperature and (B) mosquitoes densities of A. gambiae and A. funestus in the Rufiji 
DSS October 2001 -September 2004. 
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Table I. Results of association of environment/climate variables on sporozoite rate and mosquito density and spatio-temporal 
parameters 



Sporozoite rate 



Density 



Parameter 



Model: binomial 



Model: zero inflated negative binomial 



AF 



AG 



AF 



AG 



Seasonality 
Constant 
Cos 12 
Sin 12 
Cos 6 
Sin 6 

Environment and climate 
NDVI 
RAIN 
LSTD 
LSTN 

Distance to the water bodies 
Annual trend 

Year 2 

Year 3 
Spatial process 

Range b (in km) c 

Variance <ri 
Temporal process 

Correlation y 

Variance a 2 T 
Other parameters 

Non-spatial variance a 2 

Over-dispersion r 
Covariates on the mixing probability 

Constant 

NDVI 

RAIN 

LSTD 

LSTN 



Median (95% Cl a ) 
0.04 (0.01 , 0.23) 0.07 (0.02, 0.56) 



0.99 (0.41, 2.41) 
0.84 (0.31, 2.53) 
1 .27 (0.66, 2.47) 
0.65 (0.34, 1 .25) 

1 .03 (0.85, 1 .25) 
0.96 (0.73, 1.26) 
2.31 (1.06, 6.97) 

1.04 (0.52, 3.51) 
0.93 (0.76, 1.11) 

1.01 (0.61, 1.67) 
0.41 (0.2, 0.79) 



0.72 (0.29, 1 .66) 
0.54 (0.19, 1.32) 
0.81 (0.44, 1.53) 
0.87 (0.45, 1.68) 

0.93 (0.79, 1.1) 
0.53 (0.36, 0.79) 

0.92 (0.7, 1.22) 
0.96 (0.73, 1.27) 
0.97 (0.85, 1.1) 

0.48 (0.31, 0.75) 
0.37 (0.24, 0.57) 



35.52 (11.1, 78.81) 49.95 (15.54, 81.03) 

0.9(0.37,2.36) 0.45(0.2,1.18) 

0.5 (-0.52, 0.96) 0.5 (-0.51, 0.96) 

0.34 (0.14, 1.11) 0.33 (0.14, 0.94) 



0.31 (0.16, 0.61) 



0.34 (0.19, 0.59) 



Median (95% Cl a 



1.03 (0.33, 2.4) 
1.1 (0.54, 2.3) 
0.75 (0.4, 1 .55) 
0.75 (0.43, 1 .39) 
1.13 (0.58, 2.08) 

1.15 (0.87, 1.6) 
1.33 (1.06, 1.68) 
1 .23 (0.81 , 1 .69) 
1.47 (1.02, 2.02) 

0.96 (0.65, 1.22) 

0.13 (0.08, 0.24) 
0.34 (0.2, 0.61) 

21.1 (12.2, 56.6) 
1 1 .35 (6.58, 29.2) 

-0.15 (-0.79, 0.67) 
0.61 (0.22, 2.59) 

2.88 (1.81, 4.4) 
2.64 (1.7, 3.67) 

0.07 (0.02, 0.21) 
0.3 (0.17, 0.54) 
1.3 (0.84, 5.37) 

0.07 (0.01, 0.64) 

0.53 (0.27, 1.14) 



2.4 (0.53, 4.03) 
0.39 (0.2, 0.86) 

0.6 (0.32, 0.96) 
0.76 (0.41, 1.13) 
0.99 (0.53, 2.43) 

1.11 (0.92, 1.35) 
1.26 (0.97, 1.79) 
0.77 (0.64, 0.89) 

0.84 (0.69, 1 .03) 
0.94 (0.79, 1.11) 

0.17 (0.11, 0.25) 
1.6 (1.04, 2.53) 

15.5 (8.9, 32.19) 
5.04 (3.1, 10.33) 

0.08 (-0.77, 0.83) 
0.51 (0.2, 2.55) 

2.59 (1.89, 3.2) 
1.16 (0.77, 1.61) 

0.13 (0.06, 0.25) 
0.93 (0.7, 1 .29) 
0.65 (0.36, 1 .85) 
0.05 (0.02, 0.18) 
0.71 (0.28, 3.64) 



a Credible Intervals (or posterior intervals). 

b Based on spatial decay parameter, the Range is calculated as 3/p ( x 1 1 1 km). 
c The spatial correlation is significant (>5%) within this distance. 
Bold terms indicate significant variables in the model. 



Results 

Data description 

In total of 2,479 unique locations were visited for 
the collection of the mosquitoes. A total of 15,983 A. 
funestus (from 18% of the surveyed locations, n=447) 
and 17,885 A. gambiae (from 27.3% of the surveyed 
locations, n=678) mosquitoes were captured. About 
83 and 74.3% of the visits for mosquito collection for 
A. funestus and A. gambiae received zero counts. The 



crude annual SRs were 3.3, 2.8 and 3.2% for Year 1 
(October 01 -September 02), Year 2 (October 02-Septem- 
ber 03) and Year 3 (October 03-September 04), respec- 
tively. The crude EIR were 507, 72.8 and 146 infectious 
bites/person/year for 3 years respectively. In Fig. 1, the 
relation between rainfall, temperature and mosquito 
density is shown (data collapsed in a period of one 
calendar year). 

Most A. gambiae mosquitoes were captured during the 
months of April and May while most A. funestus were 
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EIR Prediction Oct 01 EIR Prediction Nov 01 EIR Prediction Dec 01 



(a) 




EIR Prediction Apr 02 EIR Prediction May 02 EIR Prediction June 02 




EIR Prediction Apr 03 EIR Prediction May 03 EIR Prediction June 03 



(c) 




EIR Prediction Apr 04 EIR Prediction May 04 EIR Prediction June 04 



(d) 




Fig. 2. Selected EIR maps showing the spatial distribution and the seasonal pattern, for the period of Oct 2001 -Sept 2004. (A) 
Dry months followed by the period of short rains, (B) Months immediately after the onset of heavy rains during the first year 
(very wet), (C) Months immediately after the onset of heavy rains during the second year (dry) and (D) Months immediately 
after the onset of heavy rain season during the third year (normal rains). 
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collected in the period of July-September. The number 
of A. gambiae collected was higher during the heavy 
rains while short rains with high temperature favour the 
population of A. funestus (Fig. 1). 

Geostatistical model results 

Table 1 summarizes the results of parameter estimations 
from a multivariate geostatistical models on SRs and 
mosquito density. 

The effect of environmental variables differs signifi- 
cantly between species. Rain and temperature are the most 
influencing factors for density and sporozoite with higher 
effect on the A. funestus species. No significant effect of 
distance to the water bodies was obtained, highly pro- 
nounced with a significant decrease of mosquito popula- 
tion in Year 2 as compared to Year 1 and later an increase 
in the Year 3 as compared to Year 2. Spatial ranges are 
quite high especially for the SRs. The estimate of the over- 
dispersion parameter of A. funestus is twice as large as that 
of A. gambiae which could be influenced by the amount of 
zero counts in the data. However, the estimate of r is larger 
than 1 indicating that the data are not highly overdispersed 
(58). Day temperature significantly reduces the probability 
of observing zero mosquito counts. Spatial variability 
accounts more for the total variability in the data as 
compared to the non-spatial and temporal variability. 

For a total of 63, 99, 368 and 368 test locations 
selected for validation of SR-AF, SR-AG, Density-AF 
and Density-AG models respectively, 68.3, 63.6, 84.1 and 
89.9% of the locations were correctly predicted within 
95% credible interval. Gelman-Rubin diagnostics indi- 
cated good convergence of all model parameters. 

Mapping of EIR 

Figure 2 presents selected EIR maps for the Rufiji DSS 
site for the A. funestus and A. gambiae. 



The southern part of the DSS showed high transmis- 
sion throughout the years. High transmission was ob- 
served immediately at the onset of rains, especially during 
the heavy rain period. At the end of the rainy season 
(May-June), the transmission spread throughout the 
region (Fig. 2). 

In Fig. 3, monthly time series (median) predicted EIR 
are plotted for the entire study period. Attributes of each 
species are also indicated. 

The transmission starts peaking in the month of April 
(just after rains) and gradually drops in July (first year of 
the study). There was a reduction in the second year 
of the study and EIR increased again during the last year. 
A similar monthly trend is observed across years, which 
emphasizes seasonality. A. funestus are more prominent 
during the dry months while A. gambiae are more 
prominent during the rainy periods. The spatial temporal 
distribution of year-by-year EIR is shown in Fig. 4 with 
maps of prediction error. The prediction error for the 
EIR estimates was obtained my multiplying the predic- 
tion errors obtained from SR and density models. 

Patterns in Fig. 4 show that few surveyed house- 
holds are located in areas with EIR < 1; however, a large 
proportion of household presented high transmission 
intensity. Higher prediction errors are seen in areas with 
few surveyed locations. The errors also capture the effect 
of heterogeneity arising from unmeasured factors. 

Population-adjusted EIR 

The annual and species-specific population-adjusted EIR 
were calculated by averaging predicted inoculation rates 
at all households (N = 14,516) within the RDSS (Fig. 5) 
excluding all of the other pixels. Results are presented in 
Table 2. 



o 
E 



LU 



An.funestus + An.gambiae 

— An.funestus 

— An.gambiae 




Months 

Fig. 3. Predicted monthly EIR median and attribute of each species in Rufiji DSS. 
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Fig. 4. Spatial temporal distribution of annual EIR with prediction error maps. 
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Table 2. Overall predicted EIR with the percent attribute of 
each species 



Period 


A. funestus 
+A. gambiae 


A. funestus 


A. gambiae 


Year 1 


853.6 


582.9 (68%) 


270.7 (32%) 


Year 2 


113.7 


88.8 (78%) 


24.9 (22%) 


Year 3 


286.1 


107.2 (37%) 


178.9 (63%) 



Overall transmission intensity reduced significantly 
during Year 2 and 3 as compared to Year 1 of the study. 
A. funestus was the main responsible vector for transmis- 
sion in the first (68%) and second (78%) year, while the 
last year transmission was mainly driven by A. gambiae 
(63%). 

In addition, we assessed the spatial shift (distribution) 
of transmission intensity over time, as illustrated in Table 
3. EIR were categorized into five transmission intensities 
which were: no transmission (EIR =0), very low (EIR > 
0.0-1), low (EIR > 1-10), average (EIR > 10-100), and 
high (EIR > 100). The change in the percentage of house- 
holds exposed to a specific level of transmission was then 
studied. 

The proportion of households predicted with very 
low transmission intensity increased between the first 
year and the third year of the study, from 4.0 to 7.2%. 
A significant reduction (over 68%) of locations with high 
transmission is seen during the last year of the study (i.e. 
12.6% in the first year to 4% in the third year). 

Discussion 

In this study, we assessed spatial-temporal variation and 
heterogeneity of malaria transmission in the Rufiji DSS 
site using a large geo-referenced biweekly entomological 
dataset collected over 3 years, and rigorous Bayesian 
geostatistical models. Our work is amongst the few to 
address spatial modelling of Entomology inoculation rate 
(EIR) based on sparse data by applying current Bayesian 
methodologies approximating spatial processes for large 
data. The INDEPTH-MTIMBA data, which was used in 
our application, is the most comprehensive entomological 



database in Africa. Bayesian spatio-temporal binomial 
and zero inflated negative binomial regression models 
were developed to produce monthly maps of EIR taking 
into account the malaria-climate relation and seasonality 
in transmission (35, 36, 59-61). 

Geostatistical models have been widely used in malaria 
mapping in recent years (3, 38, 52, 62-64). Most of these 
analysis involved standard geostatistical models which 
are relevant for a moderate number of locations. Com- 
putation involved in these models is not feasible for data 
collected over a large number of survey locations. In this 
study, we used methods proposed by Barnejee et al. 
(40) and Finley et al. (42) to approximate the spatial 
process using a subset of survey locations selected via 
space filling design implemented in R software. Addi- 
tive temporal correlations with autoregressive structure 
were also incorporated in all models. The predictive 
power of the model suggests good performance of the 
spatial correlation approximated from a subset of ob- 
served location. That might indicate that the subset 
selected was significantly appropriate. This work adds 
to the few in literature that indirect evaluates perfor- 
mance of using subsets to approximate the spatial process 
in real-life field data. 

Changes in climate conditions, natural inhabitants and 
other human activities, which depend on the environ- 
ment, alter the intensity of malaria transmission (21, 65). 
Our results depict temporal and seasonal variation in 
EIR along the study period and study area. Transmission 
was higher during the rainy periods with high tempera- 
tures and very low during the dry season or year. Two 
species A. funestus and A. gambiae are mainly responsible 
for malaria transmission in this region. Differences on 
the effect of environmental factors on the mosquito 
abundance and SRs of the species were observed. The 
population of A. gambiae increases at the onset of heavy 
rains while that of A. funestus peaks during the short 
rains season. Similar results have been reported in the 
Kilombero valley and other areas with similar climate in 
Africa and are associated with the preferential conditions 
of breeding sites of these species (13, 16, 66-70). A study, 
which assessed spatio-temporal variation of EIR in 
Navrongo DSS, showed similar patterns of seasonality 



Table 3. Distribution of predicted EIR over the RDSS area by Year, N* (%) 



Category 


EIR range 


Year 1 , W a (%) 


Year 2, N (%) 


Year 3, N (%) 


No 


0 


4,896 (27.5) 


13,124 (73.8) 


4,225 (23.8) 


Very low 


>0.0-1 


704 (4.0) 


1,320 (7.4) 


1,286 (7.2) 


Low 


>1-10 


4,568 (25.7) 


2,081 (11.7) 


6,779 (38.1) 


Average 


> 10-1 00 


5,377 (30.2) 


1,068 (6.0) 


4,781 (26.9) 


High 


>100 


2,238 (12.6) 


190 (1.1) 


712 (4.0) 



a The number of households within a specific transmission intensities category. 
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that differed by species (28). Highly significant effects 
of temperature on the SR and density of A. funestus were 
observed. Contrary to A. gambiae which has relatively 
exophilic behaviour, this species is strictly endophilic, 
which could facilitate choice of conducive a resting 
environment favouring the gonotrophic cycle resulting 
in higher survival and hence longer infectivity (71-73). 
Knowledge of these characteristics can be important for 
understanding disease dynamics and for efficient imple- 
mentation of interventions (5, 6, 66, 74). 

There was considerable variation over short distances 
in the intensity of transmission. Small-scale variations in 
malaria transmission are common in sub-Saharan Africa 
and create complexity in implementing strategies to 
combat malaria (8, 28, 59, 75-77). The spatial correla- 
tion was still present over a substantial distance and the 
spatial variation comprised of about 90% of the total 
data variance. The spatial correlation arises partly due to 
spatial pattern in environmental drivers of transmission, 
partly due to effects of limited mosquito dispersion, and 
is also affected by human factors such as migration and 
human population densities (41, 42). We had an abun- 
dance of data on both mosquito and human populations; 
however, due to the relative small DSS area, it is difficult 
to separate the contributions of these different factors 
to the spatial correlation, which explains the higher 
spatial range. Such heterogeneity arising from unmea- 
sured factors is captured by the prediction errors. 

The methodology described in this study allows esti- 
mation of EIR while adjusting for both, temporal and 
small area spatial variations in a systematic and thorough 
manner. It acknowledges key characteristic of the data, 
considers computation difficulties and correlation among 
potential drivers of malaria transmission. It could be seen 
that the crude EIR were underestimated as compared 
to model-based estimations by over 55%. This underlines 
the importance of utilizing efficient methodologies while 
estimating epidemiological parameters to allow for proper 
decisions. 

Our formulation allows further expansion and easy 
incorporation of other covariates in the main structure 
of the model either as specific covariates or their interac- 
tion. The complex component of our proposal is how to 
separately model SR and density data, incorporate sea- 
sonality, choosing environment lags and lastly approxi- 
mate the spatial correlation when the large number of 
location has been observed. All these have been worked on. 
Moreover, DSS sites including Rufiji, collected compre- 
hensive records of all-cause and disease mortality in the 
human population at the time of this entomological 
surveillance. The exposure surfaces estimated using this 
approach can be linked to mortality data to assess the 
malaria-specific mortality burden. Through that, much 
more accurate estimates of the benefits to be gained by 
reducing malaria transmission can be estimated than if 



it would have been possible from analyses that aggregate 
EIR over large areas and time periods or those fitted 
assuming normally distributed EIR. 
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